Avoid quadratic complexity in poll() #155
base: master
Conversation
The issue is that poll is needed to advance the state of each in-flight request. Otherwise, the later ones can't make any progress until the prior ones complete. That being said, the current implementation is not ideal; the best way to handle this is to use `with_unpark_event`. The reason I didn't just do it is that it requires extra allocations + bookkeeping. So, you only want to switch to the more complex handling when there are enough in-flight requests to make it worth it.
Yes, I understand why it may be desirable to trigger in-flight objects as early as possible, in case they can do some processing in the background.
This also fixes the issue I encountered, while still triggering more than just the first in-flight entry, and the number of entries that get polled per pass stays bounded.
Here is the situation: Request 1 comes in and sleeps for 30 seconds. Anyway, the point is that the correct fix, as mentioned, is to only poll futures that we know 100% need to be polled. To do this, we need to use `with_unpark_event`. Also, I'd say that there should be a default max in-flight setting of 100.
Some arbitrary limit like 10 or 100 would mitigate the issue, for sure. With an echo server on a Unix domain socket, I saw more than 1,000,000 entries in the in-flight queue, given enough input. Perhaps that's even the core issue. And if the queue length is limited to some sane value, it doesn't matter too much whether it's iterated O(n) or O(n^2) times.
Again, the correct way to solve the original issue would be to use `with_unpark_event`.
Incorrect in the sense that it may be (possibly much) slower, right? Or does it break some contracts / invariants that I don't know about? (Just trying to understand how the crate works now...)
streaming::pipeline::server::Dispatch::poll() called poll on every single entry of self.in_flight, and then removed at most one element from that structure (via pop_front()). So, the complexity was O(n^2) in the size of self.in_flight. Instead, call poll only once for a new in_flight entry, and then only when an unpark event was received.
What do you think about this patch?
About the test failure on nightly, shown here: https://travis-ci.org/tokio-rs/tokio-proto/jobs/213217871
Seems to be a known bug: #114
Sorry for the delay. Re: "Incorrect", I would say it is a bit of both. The pipeline behavior is to enable processing futures concurrently. Given that futures are supposed to be lazy (not do any work until they are polled), not polling means that this behavior isn't followed. Re: the new PR, I did a quick skim, but I'm going to need to read it more closely before I provide feedback :)
At a glance, the implementation seems plausible, though I'm not yet following why some of the bookkeeping is needed. It would also be nice to have a minimum number of in_flight requests before switching to `with_unpark_event`. Also, it seems to me that this functionality could be extracted into something that can be reused... I'm not sure if that is worth it yet.
    for slot in self.in_flight.iter_mut() {
        slot.poll();

    // new Futures must be polled once to start processing
    if self.polled_to < self.in_flight.len() {
The initial poll could probably be handled before inserting it into the in_flight structure.
Yes, could be easily moved to fn dispatch, a few lines up in the same file.
I just wasn't sure if that would break any non-obvious assumption, so I preferred to keep the poll at the same place as before, for now.
If we first start calling poll without `with_unpark_event`...

(Updated comment: The proposed strategy should be possible, as we never switch from 'poll unconditionally' to 'use `with_unpark_event`'.)
I did some simple benchmarks to find out what could be a reasonable threshold to start using `with_unpark_event` (the X in #155 (comment)). This is not an exact value at all. The optimum seems to be somewhere in the range 32 to about 300, so a larger value could be used as well. But because lower values quickly become the better choice for a more expensive poll function, I'd tend to the lower end of that range. For reference, this is the test code I used to do the benchmark:
On the (rather old) notebook I tested it on, the test took about 7 seconds when using `with_unpark_event`.
I actually started looking more into `with_unpark_event`.
welp, this entire topic drove me to shave some yaks: rust-lang/futures-rs#436
Just a quick update. I didn't forget this PR. As mentioned above, I think the right way to handle it is to extract the needed logic into a standalone type in futures-rs. However, that sent me into the yak shave of improving the futures-rs task system... Once we get out of that, I will try to extract the logic that is generally useful from this PR into futures-rs.
streaming::pipeline::server::Dispatch::poll() called poll on every
single entry of self.in_flight, and then removed at most one
element from that structure (via pop_front()).
So, the complexity is O(n^2) in the size of self.in_flight.
Calling poll() on the first entry of self.in_flight should be
sufficient, the other entries can be handled later. This reduces the
complexity to O(n).
This cost can be significant if self.in_flight gets large, which can
happen if input is read very quickly. I triggered it with a simple
echo server fed from a unix domain socket:
$ time yes | head -n 1000 | nc -U -N /tmp/sock >/dev/null
real 0m0.051s
$ time yes | head -n 10000 | nc -U -N /tmp/sock >/dev/null
real 0m3.534s
$ time yes | head -n 20000 | nc -U -N /tmp/sock >/dev/null
real 0m13.883s
With this patch, the runtime becomes linear and much faster:
$ time yes | head -n 1000 | nc -U -N /tmp/sock >/dev/null
real 0m0.010s
$ time yes | head -n 10000 | nc -U -N /tmp/sock >/dev/null
real 0m0.049s
$ time yes | head -n 20000 | nc -U -N /tmp/sock >/dev/null
real 0m0.084s
$ time yes | head -n 100000 | nc -U -N /tmp/sock >/dev/null
real 0m0.405s
$ time yes | head -n 1000000 | nc -U -N /tmp/sock >/dev/null
real 0m3.738s